Font Size Independent OCR for Noori Nastaleeq
Abstract
This paper presents a technique for font-size-independent OCR of Noori Nastaleeq. Most existing OCR systems for Noori Nastaleeq support only a single font size, whereas Urdu government documents, newspapers, magazines, and books written in the Noori Nastaleeq font style use a wide range of font sizes. The presented technique adds font size independence to an existing Noori Nastaleeq OCR [5], enabling it to recognize Noori Nastaleeq text of different font sizes. The technique resizes the input ligature using splines: the outline of the input ligature is extracted, and a scaling factor determined by the font size is applied to bring the outline to the size at which the existing OCR was trained. The scaled outline is then converted back into image form so that the OCR can recognize it. The technique was tested on Urdu single-character ligatures, achieving a recognition rate of 98% on manually generated data and 96% on data scanned from different books and magazines.
Keywords: OCR, Noori Nastaleeq, Splines.
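The resizing steps described in the abstract (scale the outline, then convert it back to an image) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract says only "Splines", so the closed Catmull-Rom spline is an assumption, as are the function names, the font sizes in the usage note, and the even-odd fill used for rasterization. Outline extraction from the scanned image is omitted, and the outline is assumed to be given as a list of (x, y) control points.

```python
def scale_outline(points, input_size, trained_size):
    """Scale outline control points from the input font size to the
    font size the OCR was trained on (both sizes are assumed known)."""
    factor = trained_size / input_size
    return [(x * factor, y * factor) for x, y in points]


def catmull_rom_closed(points, samples_per_segment=8):
    """Sample a closed Catmull-Rom spline through the control points.
    The spline type is an assumption made for illustration."""
    n = len(points)
    curve = []
    for i in range(n):
        # Four consecutive control points, wrapping around the closed outline.
        p0, p1, p2, p3 = (points[(i + k - 1) % n] for k in range(4))
        for s in range(samples_per_segment):
            t = s / samples_per_segment
            t2, t3 = t * t, t * t * t
            curve.append(tuple(
                0.5 * (2 * b + (c - a) * t
                       + (2 * a - 5 * b + 4 * c - d) * t2
                       + (3 * b - a - 3 * c + d) * t3)
                for a, b, c, d in zip(p0, p1, p2, p3)
            ))
    return curve


def rasterize(polygon, width, height):
    """Fill a closed polygon into a binary image (list of rows) using an
    even-odd test at each pixel center."""
    image = [[0] * width for _ in range(height)]
    n = len(polygon)
    for row in range(height):
        y = row + 0.5
        for col in range(width):
            x = col + 0.5
            inside = False
            for i in range(n):
                (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
                # Count edges that cross the horizontal ray to the right of (x, y).
                if (y1 > y) != (y2 > y):
                    x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                    if x < x_cross:
                        inside = not inside
            image[row][col] = 1 if inside else 0
    return image
```

A hypothetical use would call `scale_outline(outline, 18, 36)` for an 18-point ligature and an OCR trained at 36 points, pass the result through `catmull_rom_closed`, and hand the `rasterize` output to the recognizer. Scaling the outline rather than the bitmap keeps the glyph edges smooth after resizing, which is the motivation the abstract gives for working with splines.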
Similar resources
Handwritten Nastaleeq Script Recognition with BLSTM-CTC and ANFIS method
A recurrent neural network (RNN) has been successfully applied for recognition of cursive handwritten documents, both in English and Arabic scripts. Ability of RNNs to model context in sequence data like speech and text makes them a suitable candidate to develop OCR systems for printed Nastaleeq scripts (including Nastaleeq for which no OCR system is available to date). In this work, we have pr...
Optical Font Recognition from Projection Profiles
• Recognition of logical document structures [1], where knowledge of the font used in a word, line, or text block may be useful for defining its logical label (chapter title, section title or paragraph).
• Document reproduction, where knowledge of the font is necessary in order to reproduce (reprint) the document.
• Document indexing and information retrieval, where word indexes are generally p...
FONT DISCRIMINATION USING FRACTAL DIMENSIONS
One of the related problems of OCR systems is discrimination of fonts in machine printed document images. This task improves performance of general OCR systems. Proposed methods in this paper are based on various fractal dimensions for font discrimination. First, some predefined fractal dimensions were combined with directional methods to enhance font differentiation. Then, a novel fractal dime...
Font and Size Identification in Telugu Printed Document
Telugu is the official language derived from ancient Brahmi script and also one of the oldest and popular languages of India, spoken by more than 66 million people, especially in South India. While a large amount of work has been developed for font and size identification of English and other languages, relatively not much work has been reported on the development of OCR system for Telugu text....
A Complete Tamil Optical Character Recognition System
The aim of the present work is to recognise printed Tamil text. Though commercial Optical Character Recognition (OCR) packages are available in the market for Roman Script, not much work has been done in the field of OCR for Indian languages. Indian scripts usually have a large number of symbols and hence, recognition is a challenging task. In the current context, a complete OCR in printed Tami...
Publication date: 2009